Non-parametric Bayesian modelling of digital gene expression data

Abstract

Next-generation sequencing technologies provide a revolutionary tool for generating gene expression data. Starting with a fixed RNA sample, they construct a library of millions of differentially abundant short sequence tags or "reads", which constitute a fundamentally discrete measure of the level of gene expression. A common limitation in experiments using these technologies is the low number or even absence of biological replicates, which complicates the statistical analysis of digital gene expression data. Analysis of this type of data has often been based on modified tests originally devised for analysing microarrays; both these and even de novo methods for the analysis of RNA-seq data are plagued by the common problem of low replication. We propose a novel, non-parametric Bayesian approach for the analysis of digital gene expression data. We begin with a hierarchical model for modelling over-dispersed count data and a blocked Gibbs sampling algorithm for inferring the posterior distribution of model parameters conditional on these counts. The algorithm compensates for the problem of low numbers of biological replicates by clustering together genes with tag counts that are likely sampled from a common distribution and using this augmented sample for estimating the parameters of this distribution. The number of clusters is not decided a priori, but it is inferred along with the remaining model parameters. We demonstrate the ability of this approach to model biological data with high fidelity by applying the algorithm on a public dataset obtained from cancerous and non-cancerous neural tissues.
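
The kind of data-generating process the abstract describes (genes grouped into clusters, with over-dispersed counts drawn from each cluster's distribution) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' model or their blocked Gibbs sampler: it assumes a truncated stick-breaking representation of a Dirichlet process for the cluster weights, a Gamma-Poisson (negative binomial) distribution for the counts, and an arbitrary uniform base distribution for the cluster parameters. All function names and parameter ranges are hypothetical.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights (assumed prior, not from the source).

    The final break is fixed at 1 so the weights sum to 1.
    """
    weights = []
    remaining = 1.0
    for k in range(truncation):
        v = 1.0 if k == truncation - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def sample_poisson(rate, rng):
    """Poisson draw: count unit-rate exponential arrivals that fall in [0, rate)."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < rate:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def sample_overdispersed_count(mean, dispersion, rng):
    """Gamma-Poisson draw with variance = mean + dispersion * mean**2."""
    shape = 1.0 / dispersion
    scale = mean * dispersion  # shape * scale equals the requested mean
    return sample_poisson(rng.gammavariate(shape, scale), rng)


def simulate_counts(n_genes, n_samples, alpha, truncation, seed=0):
    """Simulate a genes-by-samples count table with latent cluster labels."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, truncation, rng)
    # Hypothetical base distribution: each cluster gets a (mean, dispersion) pair.
    clusters = [(rng.uniform(1.0, 50.0), rng.uniform(0.05, 1.0))
                for _ in range(truncation)]
    labels = rng.choices(range(truncation), weights=weights, k=n_genes)
    counts = [[sample_overdispersed_count(*clusters[z], rng)
               for _ in range(n_samples)]
              for z in labels]
    return weights, labels, counts


if __name__ == "__main__":
    weights, labels, counts = simulate_counts(
        n_genes=10, n_samples=2, alpha=1.0, truncation=20)
    print("occupied clusters:", len(set(labels)))
    for z, row in zip(labels, counts):
        print(z, row)
```

With only two samples per gene, a single gene carries little information about its own mean and dispersion. Genes that share a label share parameters, which is the pooling effect the abstract relies on to compensate for low replication. The number of occupied clusters is not fixed in advance; it varies with `alpha` and the random draws.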

Bibliographic record

Authors: Vavoulis, Dimitrios V.; Gough, Julian
Year: 2013
Original format: PDF
Language: English

Similar literature

1. Non-Parametric Bayesian Modelling of Digital Gene Expression Data [J]. Dimitrios V Vavoulis, Julian Gough. Journal of Computer Science & Systems Biology. 2014, Issue 1
2. NPBayes-fMRI: Non-parametric Bayesian General Linear Models for Single- and Multi-Subject fMRI Data [J]. Jeong Hwan Kook, Michele Guindani, Linlin Zhang. Statistics in Biosciences. 2019, Issue 1
3. Investigating the Effects of Imputation Methods for Modelling Gene Networks Using a Dynamic Bayesian Network from Gene Expression Data [J]. Malaysian Journal of Medical Science. 2014, Issue 2
4. A non-parametric Bayesian clustering for gene expression data [C]. Wang Liming, Wang Xiaodong. 2012 IEEE Statistical Signal Processing Workshop. 2012
5. Classical and Bayesian mixed model analysis of microarray data for detecting gene expression and DNA differences [D]. Demirkale, Cumhur Yusuf. 2009
6. Empirical Bayesian selection of hypothesis testing procedures for analysis of digital gene expression data [O]. Stan Pounds, Cuilan Gao. 2013